Conference Proceedings
Web Page Template and Data Separation for Better Maintainability
C Zhao, R Zhang, J Qi
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | SpringerLink | Published: 2018
Abstract
© 2018, Springer Nature Switzerland AG. Separating a web page into template code and the data records populated into that template is an important problem, with a wide range of applications in web page compression and information extraction. We study this problem with the aim of separating a web page into easily maintainable template code and data records. We show that this problem is NP-hard. We then propose a heuristic algorithm to solve it. The main idea of our algorithm is to parse a web page into a tree and then process the tree recursively in a bottom-up manner with three steps: splitting, folding, and alignment. We perform experiments on real datasets to evaluate the perfor..
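To illustrate the general idea of bottom-up template/data separation, the sketch below implements only a simplified folding-like step. It is an assumption-laden illustration, not the authors' algorithm: the `Node` type, the `el`/`txt` builders, the `signature` and `fold` functions, and the `repeat[...]` and `{}` template notation are all hypothetical. Runs of consecutive sibling subtrees with identical tag structure are collapsed into one repeated template, and their text becomes data records. The splitting and alignment steps named in the abstract are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    tag: str                      # element name, or "#text" for a text leaf
    text: str = ""                # only meaningful for text leaves
    children: List["Node"] = field(default_factory=list)


def el(tag: str, *children: Node) -> Node:
    """Hypothetical helper: build an element node."""
    return Node(tag, children=list(children))


def txt(s: str) -> Node:
    """Hypothetical helper: build a text leaf."""
    return Node("#text", text=s)


def signature(node: Node) -> str:
    """Structural signature: tag names only, each text leaf becomes a '{}' slot."""
    if node.tag == "#text":
        return "{}"
    return node.tag + "(" + ",".join(signature(c) for c in node.children) + ")"


def values(node: Node) -> List[str]:
    """Text leaves in document order, i.e. the data that fills the slots."""
    if node.tag == "#text":
        return [node.text]
    out: List[str] = []
    for c in node.children:
        out.extend(values(c))
    return out


def fold(node: Node) -> Tuple[str, list]:
    """Process a subtree bottom-up, returning (template, data).

    Children are handled before their parent's template is assembled.
    A run of two or more consecutive siblings with equal signatures is
    folded into one 'repeat[...]' template; each sibling in the run
    contributes one record (its list of text values).
    """
    if node.tag == "#text":
        return "{}", [node.text]
    parts: List[str] = []
    data: list = []
    kids = node.children
    i = 0
    while i < len(kids):
        sig = signature(kids[i])
        j = i
        while j + 1 < len(kids) and signature(kids[j + 1]) == sig:
            j += 1
        if j > i:
            # Identical structure, so the shared signature serves as the template.
            parts.append("repeat[" + sig + "]")
            data.append([values(k) for k in kids[i:j + 1]])
        else:
            sub_template, sub_data = fold(kids[i])
            parts.append(sub_template)
            data.extend(sub_data)
        i = j + 1
    return node.tag + "(" + ",".join(parts) + ")", data


if __name__ == "__main__":
    page = el("div",
              el("h1", txt("Title")),
              el("ul", el("li", txt("a")), el("li", txt("b")), el("li", txt("c"))))
    template, data = fold(page)
    print(template)  # div(h1({}),ul(repeat[li({})]))
    print(data)      # ['Title', [['a'], ['b'], ['c']]]
```

Because folding here requires exact signature equality, sibling records that differ slightly in structure (for example, an optional field) are not merged. Handling that kind of variation is presumably what a dedicated alignment step would address.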
Grants
Awarded by Australian Research Council
Funding Acknowledgements
This work is supported by Australian Research Council (ARC) Future Fellowships Project FT120100832 and Discovery Project DP180102050.